Attention in Attention: Modeling Context Correlation for Efficient Video Classification
Authors
Abstract
Attention mechanisms have significantly boosted the performance of video classification neural networks thanks to the utilization of perspective contexts. However, current research on attention generally focuses on adopting a specific aspect of contexts (e.g., channel, spatial/temporal, or global context) to refine features and neglects their underlying correlation when computing attentions. This leads to incomplete context utilization and hence bears the weakness of limited improvement. To tackle this problem, this paper proposes an efficient attention-in-attention (AIA) method for element-wise feature refinement, which investigates the feasibility of inserting channel context into a spatio-temporal attention learning module, referred to as CinST, and also its reverse variant, STinC. Specifically, we instantiate the context dynamics aggregated along a specific axis with average and max pooling operations. The workflow of an AIA module is that the first attention block uses one kind of context information to guide the gating-weight calculation of the second attention block, which targets the other context. Moreover, all computational operations in the attention units act on pooled dimensions, which results in a very small computational cost increase (< 0.02%). To verify our method, we densely integrate it into two classical video network backbones and conduct extensive experiments on several standard benchmarks. The source code is available at https://github.com/haoyanbin918/Attention-in-Attention .
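To make the CinST idea concrete, the following is a minimal NumPy sketch of an attention-in-attention block, not the authors' implementation: an inner channel-attention gate (computed from average and max pooling over the spatio-temporal axes) guides an outer spatio-temporal gate that refines the features element-wise. The scalar weights `w_c` and `w_st` are hypothetical stand-ins for the learned parameters of the real module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cinst_attention(x, w_c=1.0, w_st=1.0):
    """Hypothetical CinST-style attention-in-attention sketch.

    x: feature map of shape (C, T, H, W).
    w_c, w_st: placeholder scalar weights (learned in the real module).
    """
    # Inner block: channel context from average and max pooling
    # over the spatio-temporal axes (T, H, W).
    avg_c = x.mean(axis=(1, 2, 3))                   # shape (C,)
    max_c = x.max(axis=(1, 2, 3))                    # shape (C,)
    chan_gate = sigmoid(w_c * (avg_c + max_c))       # channel gating weights

    # Outer block: spatio-temporal attention whose computation is
    # guided by the channel gate from the inner block.
    guided = x * chan_gate[:, None, None, None]
    avg_st = guided.mean(axis=0)                     # shape (T, H, W)
    max_st = guided.max(axis=0)                      # shape (T, H, W)
    st_gate = sigmoid(w_st * (avg_st + max_st))      # spatio-temporal weights

    # Element-wise feature refinement.
    return x * st_gate[None, :, :, :]
```

Because all reductions act on pooled dimensions, the per-gate computation is tiny relative to the backbone, which is consistent with the paper's claim of a < 0.02% cost increase; swapping the order of the two blocks would give the STinC variant.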
Similar resources
Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification
Recently, substantial research effort has focused on how to apply CNNs or RNNs to better extract temporal patterns from videos, so as to improve the accuracy of video classification. In this paper, however, we show that temporal information, especially longer-term patterns, may not be necessary to achieve competitive results on common video classification datasets. We investigate the potential ...
Advancing Connectionist Temporal Classification With Attention Modeling
In this study, we propose advancing all-neural speech recognition by directly incorporating attention modeling within the Connectionist Temporal Classification (CTC) framework. In particular, we derive new context vectors using time convolution features to model attention as part of the CTC network. To further improve attention modeling, we utilize content information extracted from a network r...
On the Use of Spatiotemporal Visual Attention for Video Classification
It is common sense among experts that visual attention plays an important role in perception, being necessary for obtaining salient information about the surroundings. It may be the “glue” that binds simple visual features into an object [1]. Having proposed a spatiotemporal model for visual attention in the past, we elaborate on this work and use it for video classification. Our claim is that ...
A spatiotemporal model with visual attention for video classification
High level understanding of sequential visual input is important for safe and stable autonomy, especially in localization and object detection. While traditional object classification and tracking approaches are specifically designed to handle variations in rotation and scale, current state-of-the-art approaches based on deep learning achieve better performance. This paper focuses on developing...
Social Attention: Modeling Attention in Human Crowds
Robots that navigate through human crowds need to be able to plan safe, efficient, and human predictable trajectories. This is a particularly challenging problem as it requires the robot to predict future human trajectories within a crowd where everyone implicitly cooperates with each other to avoid collisions. Previous approaches to human trajectory prediction have modeled the interactions bet...
Journal
Journal title: IEEE Transactions on Circuits and Systems for Video Technology
Year: 2022
ISSN: 1051-8215, 1558-2205
DOI: https://doi.org/10.1109/tcsvt.2022.3169842